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Abstract. In this paper, we consider the problem of blind signal and image separation using a 
sparse representation of the images in the wavelet domain. We consider the problem in a Bayesian 
estimation framework using the fact that the distribution of the wavelet coefficients of real world im- 
ages can naturally be modeled by an exponential power probability density function. The Bayesian 
approach which has been used with success in blind source separation gives also the possibility of 
including any prior information we may have on the mixing matrix elements as well as on the hy- 
perparameters (parameters of the prior laws of the noise and the sources). We consider two cases: 
first the case where the wavelet coefficients are assumed to be i.i.d. and second the case where we 
model the correlation between the coefficients of two adjacent scales by a first order Markov chain. 
This paper only reports on the first case, the second case results will be reported in a near future 
The estimation computations are done via a Monte Carlo Markov Chain (MCMC) procedure. Some 
simulations show the performances of the proposed method. 

Keywords. Blind source separation, wavelets, Bayesian estimation, MCMC Hasting-Metropolis 
algorithm. 



INTRODUCTION 

Blind source separation (BSS) is an active area of research in signal and image pro- 
cessing. Different approaches have been proposed: Principal component analysis (PCA) 
[1], Independent factor analysis (IFA) [2, 3, 4], Independent component analysis (ICA) 
[5, 6, 7], Maximum likelihood estimation [8, 9, 10, 11, 12, 13, 14, 15] and Bayesian es- 
timation [16, 17, 18, 19, 20, 20, 21, 22]. All these methods use in general independence, 
sparsity and diversity of the sources either in time or in Fourier domain. 

Wavelets, as being a powerful tool of signal processing, have been largely used in 
many signal processing domains and particularly in signal denoising: [23, 24, 25, 26, 
27, 28]. They have also been used in inverse problems: [29, 30, 31]. The authors in these 
papers take advantage of the properties of the wavelet coefficients [29] : locality, multi- 
resolution, singularity detection, energy compaction and decorrelation. These outlined 
properties were said to be primary properties and give rise to what was described to be 
secondary properties: non-Gaussianity and persistency. 

Zibulevsky and Pearlmutter in [32] considered the problem of blind source separa- 
tion within a Bayesian framework using an over-complete sparse representation of the 
sources. They have, then, minimized an objective function assuming a known noise vari- 
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ance and an empirical estimation of the sources variances. 

In this paper, thanks to the unitary property of the wavelet transform, we transport 
the problem of BSS to the wavelet domain and propose to use the Bayesian estimation 
framework. 

According to the properties [29]: decorrelation (the wavelet coefficients of real 
world signals (images) tend to be approximately decorrelated) and non-Gaussianity 
(the wavelet coefficients have peaky, heavy tailed marginal distributions), we propose 
to model the distribution of the wavelet coefficients by a generalized exponential (GE) 
probability density function (pdf). Thus, independence and sparsity which are the main 
hypotheses of all the source separation techniques are not required for the sources them- 
selves, but rather for their wavelet coefficients. 

The Bayesian approach which has been used with success in blind source separation 
gives also the possibility of including any prior information we may have on the mixing 
matrix elements as well as on the hyperparameters (parameters of the prior laws of the 
noise and the sources) of the problem. 

In this work, we make use of the fast wavelet transform developed by Mallat [33] to 
have a non-redundant multi-scale representation. This paper is organized as follows: In 
section 2, we first present the general source separation problem using notation which 
can be used either in the ID, 2D or the m-D case. Then, we write the same problem in the 
wavelet domain and explicit our hypotheses about the prior distributions of the noise and 
wavelet coefficients. In section 3, we present the Bayesian approach and give the main 
expressions of the prior and posterior probability density functions. In section 4, first we 
give the basics of the MCMC algorithm and then apply it to our case. In section 5, we 
present a few simulation results to show the performances of the proposed method and 
give some comparison with other known and classical approaches. Finally, in section 6, 
we present our conclusions and perspectives. 



PROBLEM FORMULATION 

Blind image separation consists of estimating sources from a set of their linear mixtures. 
The observations consist of m images {X i: i = 1, . . . ,m} which are instantaneous linear 
mixtures of n unknown sources {Sj,j = 1, . . . ,n}, possibly corrupted by additive noise 

= l,...,m}: 

X = AS + £ (1) 

where A( rnxn ) is the mixing matrix. To be able to consider ID, 2D or even m-D signals, 
we assume that X i9 Sj and & contain each T samples representing either T samples of 
time series or T pixels of an image or, more generally, T voxels of an m-D signal. Thus, 
S is a (n x T) matrix and X and £ are (m x T) matrices. 

The blind source separation problem is to estimate both the mixing matrix A and the 
sources S from the data X and some assumptions about noise distribution and some 
prior knowledge of sources distributions. Different approaches have been proposed: 
Principal component analysis (PCA) [34, 35] mainly assumes the problem without 
noise and Gaussian distribution for sources, Independent component analysis (ICA) 
[34, 36] and Maximum likelihood estimation [35] assume again the problem without 



noise but different non-Gaussian distributions for sources, Factor analysis (FA) methods 
take account of the noise, but assume Gaussian priors both for the noise and the sources. 

The Bayesian approach is a generalization of FA with the possibility of any non- 
Gaussian priors for noise and sources as well as the possibility of accounting for any 
prior knowledge on the elements of the mixing matrix and the hyperparameters of the 
problem. In addition, it allows us to jointly estimate the sources S, the mixing matrix A 
and even the hyperparameters of the problem through the posterior: 

P (S,A,0\X)<xp(X\S,A,0) P (S\0) P (A\0) P (0) (2) 

We have used this approach before with different priors p(S\0) such as Gaussian [37] 
and mixture of Gaussians [38, 39]. We also used this approach in multi- spectral image 
separation in astronomy for separating the cosmological microwave background (CMB) 
from other cosmological microwave activities [40, 41, 42, 43, 44, 45]. 

In this paper, we are going to use the same Bayesian approach, but doing the sepa- 
ration taking the advantage of the independence and diversity properties of the wavelet 
domain coefficients of the sources. Noting by the vector s the T samples of one of the 
sources, by H the discrete wavelet transform matrix, and by w the complete wavelet 
coefficients of the 1-D signal we have 

s = Hw (3) 



Now, using the fact that the complete discrete wavelet transform is a linear and unitary 
operator (H l H = HH l = I), the problem of source separation can be easily trans- 
ported to the wavelet domain and written as: 

W x = AW a + W i (4) 

The main advantage of using this last equation in place of the original source separation 
problem is that we can more easily assign simple prior laws for W s than for S itself. For 
example, when S contains discontinuity or non-stationary, still its wavelet coefficients 
distribution can be modeled by a simple generalized exponential (GE) probability den- 
sity function (pdf) while it is harder to model appropriately signal samples distribution 
by a simple pdf. Indeed, it has been reported by many authors that the distribution of the 
wavelet coefficients of real world images are well modeled by a GE pdf: 

p(w\aj) = Q£(aJ) = 2ar f 1//3) ex P{-K a H (5) 

Note that (3=1 gives an exponential pdf and j3 = 2 corresponds to a Gaussian pdf. We 
are going to use this prior probability law in our Bayesian estimation framework. 

This is shown in the following figures. Figure (1) shows two images (Lena and the 
cameraman) which we will use later in our simulations. Figure (2) shows their respective 
histograms while Figure (3) shows their wavelet coefficients and Figure (4) shows the 
corresponding histograms of their wavelet coefficients. We can remark that even if the 
histograms of the image pixels are very different, the corresponding wavelet coefficients 
are similar and can be modeled easily by GE pdf, with different a and f3. For a given 



signal or image, these two parameters can be estimated using either the Maximum 
Likelihood (ML) method: 
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FIGURE 1. Lena and the cameraman images 




O 50 100 150 200 250 O 50 100 150 200 250 



FIGURE 2. Histograms of Lena and the cameraman images 

BAYESIAN FORMULATION 

In a first step, we assume that the sources and the noise wavelet coefficients are i.i.d. . 
Thus, to simplify the notation, we denote, respectively, by x(k), s(k) and the 





FIGURE 4. Histograms of the wavelet coefficients of Lena and the cameraman images 



vectors containing the wavelet coefficients of the data, the sources and the noise for 
a given index k. Thus, we have x(k) = As(k) + Hereafter, we omit the index k 
and note it only when needed. To proceed with the Bayesian approach, we have to assign 
the prior laws. In the following we assume: 

• The noise wavelet coefficients £ are assumed independent and = QS{a ev (3). 
Then 




• The wavelet coefficients s of the sources are also assumed independent and p(sj) = 
gS(a Sj ,(3 s ). Then 



p(s\ {a s .,p s }) = 




• The elements a i3 - of the mixing matrix A are assumed i.i.d. and Gaussian with mean 
values /lij and variances of-: 

p( aij ) = (27r< 3 )- 1 /2 exp |_ J_ (au ._^.)2| (8) 

Therefore, we may note by 

p(A\M,R a ) = (2n)~ mn ' 2 \R a \- 1 / 2 

6XP { ~ \ A ~ M )) tR a l ( VeCt (^ - M )) } (9) 

where M = {/%}, Vect(Af) means a vector containing the elements of the matrix 
M and 

Ra (mnxmn) = diag(o 2 n , °l 12 , ■ ■ ■ , °l mn ) 

• All the hyperparameters {\,-^) are assumed independent and assigned standard 
Gamma prior distributions p(x) =£(2,1), where: 

^"• b)= wiar t ' i ^ ) <10) 

The joint a posteriori law of the sources coefficients s, the mixing matrix A and the 
hyperparameters 6 is then given by: 

p(s,A,G\x) <xp(x\s,A,0)p(s\0)p(A\0)p(0) (11) 



where we noted all the hyperparameters I -jg-, ] by 0. 

The conditional a posteriori laws of s, A and 6 are then given by 



exp^ -Y,(\i,-\M.\/<*,y-Y,(\si\/ c »f' \ < 12 > 

i=l j=l 



i=i 

exp | -i (Vect( A - M)f R^ 1 (Vect( A - M)) j (13) 
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(15) 
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MCMC IMPLEMENTATION 



Once the expression of the joint a posteriori law p(s,A,0\x) of all the unknowns has 
been derived, we can use it to infer them. However, in general, the computation of the 
normalization factor needs a huge dimensional integration. When the MAP estimation is 
chosen, this normalization factor is not needed, but it is formally needed for other esti- 
mation rules such as the posterior mean. The MCMC algorithms are then the basic tools 
to generate samples from the posterior law. The main idea is to generate successively 
the samples from the posterior laws s (fc) ~ p(s\A { - k \0 {k \x), A (fc) ~ p(A\s l - k \ (fc) , x) 
and 6» (fe) ~ p(0\s^ k \ A( k \ x) and then estimate their expected values by averaging these 
samples. 

We use the Hasting-Metropolis algorithm combined to a Gibbs sampler to obtain an 
ergodic chain, and then approximate the ensemble expectation of any quantity Z by its 
empirical mean: 

1 N 
E(Z) w 7 -Yh(Z^) 

where {Z®} are samples from p(z\.). 

Noting that, when (3 = 2 and f3 s = 2, the posterior laws for the sources and for the 
elements of the mixing matrix are Gaussian, we can use these Gaussian as the trial (or 
instrumental) pdf. Thus, to simplify the presentation of the proposed algorithm, we give 
here the expressions of these Gaussian posterior laws: 

• The trial posterior pdf of the sources is Gaussian g(s\0, x) = N(s, R s ) with 

s = 2R s A l R-]x (16) 



and 



R s = -{A t R~]A + R- ct ])- 1 



(17) 



where 



Ra s (nxn) = diag(a ai , Q!,,, . . . 



2 ~ 2 al 



Ra e (mxm) = diag(a: ei , a 2 2 , . . . , a 2 err 

The trial posterior pdf of the mixing matrix elements is Gaussian g(Vect(A)\6, x) = 
J\f(Yect(M),R a ) with 

Vect(M) = R a (2Yect(sx t R-l) + i?; 1 Vect(M)) (18) 



and 
where 



R a ={2{E t R-]E).*C + R- a 1 ) 1 (19) 



-E(mxmn) = Wockdiag ([1 , . . . , 1] (nx i) , m) 

C (mnxmn) = blockdiag(sS( nxn) ,m) 

where blockdiag(Af ,m) stands for a m block-diagonal matrix with matrix M 
as the block elements, and A.*B stands for a point-wise multiplication of two 
matrices, i.e. C = A. * B means Cy = AijBij. 

The proposed MCMC algorithm is then the following: 

• Initialize s, A, 9 to s°, A , 0° and repeat the following steps until convergence 

• Sampling s(k), for k = 1 ... K: 

z^g(z\0,x) =Af(s,R s ) 
where s and R s are given, respectively by eq. (16) and eq. (17) and 



a (*+i)(jfe) = / z with probability p 

1 L s^(k) with probability 1 — p 



with 

p = minfl M'W'W I 9(z) 



'p(s®(k)\x(k),A,0)/ g(s®(k)) 
where p(z\x(k), A, 0) is given by eq.(12). 
Sampling A: 

z^g(z\e,x)=M(Vect(M),R a ) 
where M and R a are given, respectively by eq. (18) and eq. (19) and 

j^(t+i) _ f Mat(z) with probability p 

\ A® with probability 1 — p 

with 

. / p(z\x,s,0) I g(z) \ 

9 "v 'p(Vect(A(*))|a!,a,e)/ g(Vect(A®)) J 
where p(z\x(k), A, 0) is given by eq. (13). 



• Sampling 9,i = \, for i — 1 . . . m: 



{t+1) ~>G{a,b) 



with 



K 

J 



K 



+ 2 and b= ^ ^(fc) - [As(fc)] i | /3 + 1 



,fe=i 



Sampling 9j = -37, for j = 1 . . . n: 



with 



K 

Is 



ef +1) ^g(a,b) 



K 



+ 2 and b= (^2\ Sj (k)f s + lj \ 

k=l 



SIMULATION RESULTS 



To illustrate the performances of the proposed method, we consider two cases: a favor- 
able case where we have 2 unknown sources with 3 measured data, and a more difficult 
case where we have only two measured data. In the first case, we consider 64 x 64 pixel 
images of the two images of Figure (1) with the following rectangular mixing matrix: 



A 



0.8211 0.4053 
0.3769 0.7997 
0.4287 0.4428 



to generate the mixed images and added a white Gaussian noise of zero mean to obtain 
the data with a SNR = 30dB, where SNR is defined as being the ratio of the mixed signal 



energy to that of the noise in dB: SNR dB = 101og 10 
images obtained. 



1*11- 



Figure (5) shows the mixed 




FIGURE 5. The mixed images in the rectangular case 



We applied the proposed method directly on the mixed images where we assumed 
noise to be i.i.d. and original images to be independent and Gaussian. Then, we ac- 
counted for the local correlation between neighboring pixels through a Markovian mod- 
eling of the original images. Finally, we applied the method in the wavelet domain. 
Figure (6) shows the separated images obtained for each case. 




PSNR = 0.0910 PSNR = 0.0651 PSNR = 0.1144 




PSNR = 0.1712 PSNR = 0.1049 PSNR = 0.1306 

FIGURE 6. Estimated source images : (a) Sources assumed independent, (b) Accounting for local 
correlation in the sources, (c) Estimated sources obtained in the wavelet domain. 

We may note that in this case which is an extremely favorable case the three different 
methods give satisfactory results and it is not easy to really distinguish between these 
three methods as it can also be noted from the PSNR's of the reconstructed images 
compared to the original images. We can, however, speculate that accounting for local 
correlation of the image pixels outperforms the other two methods. 

We have also considered a second case where we have an equal number of mea- 
surements and sources (square case). The original source images where mixed with the 
following matrix: 

_ [ 0.9088 0.4928 " 
0.4172 0.8702 



and the same type of noise was added to obtained the data with a SNR = 30dB shown in 
Figure (7). 



FIGURE 7. The mixed images for the case of a square mixing matrix 



Figure (8) shows the reconstructed images by the three methods of modeling the 
source images, i.e. Gaussian i.i.d. , Gauss-Markov on pixels and GE on their wavelet 
coefficients. 
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FIGURE 8. Estimated source images : (a) Sources assumed independent, (b) Accounting for local 
correlation in the sources, (c) Estimated sources obtained in the wavelet domain. 



We should point out that we have used the following values for the initialization of 
the algorithm: 



2(0) _ 2(0) _ 2(0) _ , j Oifi = V% , (3 = 2 

~ ai ~° 2 - 1_> \aS> = (2)V/»., A = l.9 

The final estimated values obtained by averaging the last 10% samples after 5000 
iterations are the following: 

a £l = 24.0966, a e2 = 24.2096 
a Sl = 91.4272, a S2 = 83.5939 

We may also note that the estimated values of and a S2 directly from the 

original images are: 

a ei = 7.6457, a e2 = 7.2784 
a si = 96.5342, a S2 = 107.9316 

We notice that neither the noise variances nor the variance of the second image (the 
cameraman) were well estimated. We clearly notice that in Figure (8). However, the sep- 
aration of the images in the wavelet domain outperforms the separation applied directly 
to the images assuming sources to be independent and this is due to the decorrelation 
property of the wavelet transform. In fact, the wavelet transform nearly decorrelates a 
signal, thus assuming independent wavelet coefficients is more realistic than assuming 
independent signal samples. 

Figure (9) shows the rate of acceptance of the generated samples from the Gaussian 
to approximate the posterior law of the wavelet coefficients for f3 s = 1.9. 




FIGURE 9. Rate of acceptance of the samples for the wavelet coefficients along the iterations 
We also noticed that this rate of acceptance is a function of the parameter (3 S : 

p\0 as (3 S \1 

and 

p/l as fa/2 

Figure (10) shows the convergence of the elements of the matrix A and Figure (11) 
shows the convergence of the hyperparameters. 

Figure (12) shows the histograms of the original and estimated images while Figure (13) 
shows the histograms of the wavelet coefficients of the original images superposed with 
the Exponential pdf with parameter a estimated with the algorithm. 
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FIGURE 10. Convergence of the elements of A during the first 200 iterations 




a e a s 
FIGURE 11. Convergence of the hyperparameters 6: Left: a e \ and a £ 2 Right: a s \ and a s2 . 

CONCLUSIONS AND PERSPECTIVES 

In this contribution we proposed an approach to jointly estimate the mixing matrix 
and the original source images. We transported the problem to the wavelet domain 
using a Bayesian approach where the wavelet coefficients of real world images are 
naturally modeled by generalized exponential distributions. Independence of the wavelet 
coefficients of signals is more realistic than the independence of the signals themselves. 

In a first step, we assumed all the wavelet coefficients to be independent and iden- 
tically distributed and follow a GE pdf with a fixed value for its parameter f3 s while 
its second parameter is estimated during the iterations. Even if this gives satisfactory 
results, it will be better to estimate (3 S too during the iterations. 

A second point is that the choice of a Gaussian trial pdf is good when (3 S is not far 
from 2, but it seems that this choice is no more efficient when (3 S approaches 1. 

Finally, since the wavelet coefficients of real world signals (images) tend to propagate 
through scales, a future work is to put a Markovian model on the wavelet coefficients 
taking into account inter- scale correlation of the coefficients. 




FIGURE 12. The histogram of: (a) Original source images, (b) The estimated images (top: Lena image, 
bottom: The cameraman image) 




FIGURE 13. The histogram of the wavelet coefficients of the original source images superposed with 
the pdf of the estimated images of: (a) Lena image, (b) The cameraman image 
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